Abstract. This paper evaluates the application of hierarchical skill analysis
methods developed in aerospace to the problem of surgical skill assessment and
modeling. The analysis employs tool motion data from Fundamentals of
Laparoscopic Surgery (FLS) tasks collected from clinicians of various skill
levels at three different clinical teaching hospitals in the United States.
Outcomes are evaluated based on their ability to provide relevant information
about the underlying processes across the entire system hierarchy,
including control, guidance and planning.


1 Introduction 

Over 32,000 deaths and $9B in losses are attributed annually to avoidable
surgical errors [1], highlighting the need to ensure quality by mandating
proficiency benchmarks in standardized training and credentialing [2] of
surgical trainees. Modern procedural trainees and graduates prove inadequately
prepared and in need of additional training [3,4]. Yet this is not viable with
the faculty-intensive, time-demanding, subjective methods in use today,
particularly given the steady influx of ever-changing technologies into the
operating room. Automated, objective assessments of skill are needed. To
ultimately address such a broad issue, simple performance scores such as task
time or path length will not suffice; the entire, complex spectrum of human
skill must be treated.


1.1 Surgical skill analysis 


Prior work [5,6] has made considerable advances in task segmentation and skill
classification for surgical contexts, particularly [7,8,9,10,11]. These
approaches tend to focus on a specific subtask or modality (e.g., robotics vs.
manual laparoscopy). While they succeed in providing valuable metrics to
discriminate skill or procedural context, they do not extend directly to
hierarchical human skill constructs like perception, planning, and cognition
that are ultimately vital to this area [12]. No comprehensive framework exists
that can successfully tie together these many disparate attributes. Universal
metrics of skill, proposed in [13], provided an early approach to such
task-agnostic metrics and produced datasets to ultimately evaluate them;
however, that work yielded little progress towards such a goal. We herein
introduce a different approach based on invariants. The approach makes it
possible to delineate between key processes of the hierarchical control and
sensory system. We provide a preliminary evaluation for laparoscopy.


1.2 Alternative Skill Model and Analysis Framework 

More recently, researchers have grown interested in a more formal dynamics- and
control-based theory of perception and action. Notable examples include
Warren’s control theory of the dynamics of action and perception [14,15]. More
comprehensive models that capture the closed-loop interaction have been proposed
in the aerospace field in the form of multi-loop models. The loops are organized
hierarchically, from low-level attitude stabilization, to tracking, and
ultimately to goal-directed maneuvering [16,17,18]. These models and efforts
suggest that comprehensive skill evaluation requires accessing and using
information across the different levels of the system hierarchy: not just the
performance “outputs” but also the various internal processes and, if possible,
the “inputs” to the system such as perception and attention.

The multi-loop framework provides a rigorous, deterministic basis for
measurement, evaluation and modeling of skills. Figure 1 shows the primary loops
in a multi-loop model [19]. This hierarchical multi-loop model suggests that
operators learn feedback structures across multiple levels.



Fig. 1. Hierarchic multi-loop model of human guidance behavior. The top level
describes the planning level based on the decomposition of the task and
environment in terms of interaction patterns. The plan is codified based on a
subgoal sequence $g_k$. The currently active subgoal defines the reference for
the perceptual guidance. The latter extracts the current motion gap, which is
used to determine a state reference trajectory $x_{ref}$. At the lowest level, a
tracking feedback system implements the desired motion.






























The concept of “interaction pattern” was introduced in [20] following an
investigation of human guidance behavior using experiments with miniature
remote control helicopters. These patterns are based on invariants of the
closed-loop interactions. The significance of these invariants is that they
describe the principles human pilots or operators use to break down complex
guidance problems into a sequence of smaller, tractable ones. Preliminary
studies based on piecewise affine (PWA) model identification methods suggest
that the equivalence classes can be further decomposed into distinct dynamic
modes, which provides deeper insight into lower-level control strategies. The
higher-level interaction patterns combined with the lower-level dynamic modes
provide the building blocks needed to codify the behavior across the entire
hierarchy, from lower-level control, guidance and perception all the way to
higher-level planning, adaptation and learning [19]. This paper investigates
how this framework can be used for surgical skill evaluation.


2 Experimental Setup 

This study employed the dataset collected in [21], which used the Electronic
Data Generation and Evaluation (EDGE) platform (Simulab Corp., Seattle, WA),
Fig. 2(a). The dataset consists of 22.7 hours of synchronized video and tool
motion data of Fundamentals of Laparoscopic Surgery (FLS) tasks collected from
clinicians of various skill levels at three different clinical teaching
hospitals in the United States. FLS has been shown to correlate with operating
room performance [22]. We herein incorporate only the small part of this data
that also provides ratings by faculty clinicians via blinded video review to
establish valid categories of skill.



(a) EDGE (b) Peg Transfer Task 


Fig. 2. The EDGE platform (a) and screen shot of the FLS Peg Transfer task (b) used 
for this work. 








2.1 Task Description 


In this paper, we use only the Peg Transfer task, in which clinicians use
Maryland graspers to transfer blocks in minimal time and with minimal drops.
Each block must be picked up with the laparoscopic tool in one hand and then
transferred mid-air to the tool in the other hand.


2.2 Data Overview and Group Selection 

The motion data (tool tip position, orientation, grasp angle and grasp force
for both hands) were sampled at 30 Hz. Three skill groups were selected based on
the combination of criteria in Table 1. Complete details are available in [13].

A set of six complete Peg Transfer task instances were arbitrarily selected 
from unique subjects among the three geographically distinct sites to represent 
each of the three skill groups. 


Table 1. Summary of criteria used to select each set of iterations and its
intended purpose. N refers to the total count of iterations of each set.


Group               N   Criteria

Expert (Exp)        6   Practicing laparoscopists (over 100 laparoscopic
                        procedures): surgeons’ and fellows’ best FLS-scoring
                        logs with 3/5 or greater average OSATS video review
                        scores.

Intermediate (Int)  6   15th percentile of FLS scores about the midpoint FLS
                        score determined between the lowest Expert FLS score
                        and the highest Novice FLS score [.59, .73].

Novice (Nov)        6   All logs below the 15th-percentile FLS score.


3 Surgical performance analysis 

3.1 Task-Level Statistical Analysis 

The objective of statistical analysis at the task level is to provide general
characteristics of operator skill in an intuitive context. A mapping of the
surgical movements into a probability distribution in the speed-curvature space
was adopted as an initial technique to assess general characteristics of the
operator’s performance. This analysis provides both the maneuver envelope and
the dominant states in the behavior [23]. The dominant states are defined as the
most frequently visited states, which may serve as transition quasi-equilibria
between maneuvers. The probability distribution of trajectory points in the
speed-curvature plane is shown in Figure 3. The expert group exhibits a larger
envelope and more condensed dominant states.
Fig. 3. 2D distribution of speed and curvature
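
The mapping of a trajectory into the speed-curvature plane can be sketched as
follows. This is a minimal illustration, not the implementation used in the
study: it assumes planar (2D) tool-tip positions, central finite differences,
and hand-picked bin edges, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def speed_curvature(points, dt):
    """Return (speed, curvature) pairs for the interior samples of a planar path.

    points: list of (x, y) positions sampled every dt seconds (assumed 2D).
    Velocity and acceleration come from central differences. Samples with
    near-zero speed are skipped because curvature is undefined there.
    """
    pairs = []
    for k in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[k - 1], points[k], points[k + 1]
        vx, vy = (x2 - x0) / (2 * dt), (y2 - y0) / (2 * dt)
        ax, ay = (x2 - 2 * x1 + x0) / dt ** 2, (y2 - 2 * y1 + y0) / dt ** 2
        speed = math.hypot(vx, vy)
        if speed < 1e-9:
            continue
        curvature = abs(vx * ay - vy * ax) / speed ** 3
        pairs.append((speed, curvature))
    return pairs


def bin_index(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) holding value, or None."""
    i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def speed_curvature_histogram(samples, speed_edges, curv_edges):
    """Normalized 2D histogram: rows are speed bins, columns are curvature bins."""
    rows, cols = len(speed_edges) - 1, len(curv_edges) - 1
    counts = [[0] * cols for _ in range(rows)]
    kept = 0
    for speed, curvature in samples:
        i, j = bin_index(speed, speed_edges), bin_index(curvature, curv_edges)
        if i is None or j is None:
            continue  # sample falls outside the chosen envelope
        counts[i][j] += 1
        kept += 1
    if kept == 0:
        return counts
    return [[c / kept for c in row] for row in counts]
```

In this reading, the occupied bins approximate the maneuver envelope and the
bins with the highest probability approximate the dominant states.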


Probability distributions generated over the same event space can be compared
via a symmetric Kullback-Leibler divergence
$D(p\|q) = [D_{KL}(p\|q) + D_{KL}(q\|p)]/2$. A leave-one-(surgeon)-out
permutation analysis was performed for all three skill groups. The resulting
divergence among the populations of possible group probability distributions is
shown in Table 2(a). Each entry in the table, $e_{m,n}$, is the mean and
standard deviation of
$D(P_{grp_m \setminus subj_i} \| P_{grp_n \setminus subj_j}), \forall i \neq j$.
Low values along the diagonal indicate that group distributions remain
consistent and remain distinct from other groups even if individual surgeons are
removed, whereas the low standard deviations (in parentheses) indicate little
change in overall distribution due to the removal of a single surgeon. However,
the classification power of such a distributional approach is poor; see Table
2(b). This shows that such broad statistical approaches succeed in providing
general characteristics of operator behavior in an intuitive context (the task
variables), but the generalization is too broad to be used for classification
of very specific runs or individuals.
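
The divergence and the leave-one-out comparison described above can be sketched
as follows. This is one plausible reading under stated assumptions: histograms
are flattened to lists of per-bin counts, a small constant `eps` is added to
avoid taking the logarithm of zero (a smoothing choice not stated in the
paper), and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two histograms over the same bins (flat lists)."""
    ps = [x + eps for x in p]  # eps smoothing is an assumption of this sketch
    qs = [x + eps for x in q]
    zp, zq = sum(ps), sum(qs)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(ps, qs))


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric divergence [D_KL(p||q) + D_KL(q||p)] / 2."""
    return 0.5 * (kl_divergence(p, q, eps) + kl_divergence(q, p, eps))


def pooled_without(subject_counts, skip):
    """Pool per-subject bin counts over a group, leaving one subject out."""
    kept = [c for i, c in enumerate(subject_counts) if i != skip]
    return [sum(column) for column in zip(*kept)]


def loo_divergence(group_m, group_n, same_group=False):
    """Mean and standard deviation of the divergence over left-out pairs.

    group_m, group_n: lists of per-subject bin counts. When both arguments
    are the same group, pairs with i == j are skipped.
    """
    values = []
    for i in range(len(group_m)):
        for j in range(len(group_n)):
            if same_group and i == j:
                continue
            values.append(symmetric_kl(pooled_without(group_m, i),
                                       pooled_without(group_n, j)))
    return mean(values), stdev(values)
```

Calling `loo_divergence(g, g, same_group=True)` corresponds to a diagonal entry
of the table, and `loo_divergence(g_m, g_n)` to an off-diagonal entry.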

3.2 Kinematic classification 

The kinematic classification method introduced in [23] is based on the concept
of control and attention workload. The method uses a library of motion
primitives with different attention load levels. The assumption is that experts
favor motions of low control and attention load. Simpler motions, such as
rectilinear and uniform (non-accelerated) motion, are easier to implement and
more predictable, and therefore demand less attention. These motions would
allow the maneuvers to be more efficient and more consistent in multi-trial
operations. Moreover, given the limited information processing capacity of
humans, simpler motions would leave more capacity for other cognitive
processes, in particular planning and decision making. Parsing trajectories
under human control into sub-level sequences of motion primitives provides
insights into the organizational principles, which are an











Table 2. Leave-one-(surgeon)-out (a) permutation analysis for symmetric KL
divergences of group distributions and (b) cross-validation confusion matrix for
classification success for individual surgeons.


(a) Permutation Analysis, mean (std dev) (b) Classifier Performance 
essential aspect of human spatial control skills. The metrics derived from the
segmentation include the frequency of motion primitives and the mean duration
of each segment. The kinematic classification results are shown in Figures 4
and 5.
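
The segmentation metrics named above (frequency of primitives and mean segment
duration) can be sketched as follows. The four-primitive library, the
thresholds, and the function names are hypothetical placeholders chosen for
illustration; the actual library of [23] is not reproduced here.

```python
from itertools import groupby


def label_sample(curvature, tangential_accel, curv_tol, accel_tol):
    """Assign one sample to a primitive from an assumed four-element library."""
    shape = "rectilinear" if abs(curvature) <= curv_tol else "curvilinear"
    rate = "uniform" if abs(tangential_accel) <= accel_tol else "accelerated"
    return shape + "-" + rate


def segment_metrics(labels, dt):
    """Summarize a per-sample label sequence.

    Consecutive samples with the same label form one segment. Returns a dict
    mapping each primitive to (share of all segments, mean duration in seconds).
    """
    runs = [(name, sum(1 for _ in group)) for name, group in groupby(labels)]
    stats = {}
    for name in {n for n, _ in runs}:
        lengths = [length for n, length in runs if n == name]
        stats[name] = (len(lengths) / len(runs),
                       dt * sum(lengths) / len(lengths))
    return stats
```

Under the paper's assumption, an expert trial would show a larger share and
longer mean duration for the low-load primitives such as
`rectilinear-uniform`.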




(a) Trajectory (b) Ratio 

Fig. 4. Novice kinematic classification 
Fig. 5. Expert kinematic classification


3.3 Dynamic clustering 

The dynamic clustering method is based on the assumption of hierarchic
organization of behavior [23]. It assumes that humans tend to adopt a limited
number of strategies in dealing with the complex interactions taking place
between organism, task and environment elements. The interaction involves the
whole system of processes, including perception, cognition and motor control.
With extensive practice, these interactions exhibit patterns that are
manifestations of the processes used to reduce the attention load and
facilitate the organization of behaviors. Capturing and describing these
patterns is therefore important for investigating skills across the
comprehensive hierarchy of processes.

In the dynamic clustering method, the dynamics of the human-agent-environment
interaction are described with a Piece-Wise Auto-Regressive eXogenous (PWARX)
model given in a parametric state-space form. Although the closed-loop
interaction dynamics are always non-linear, the method assumes that each
interaction pattern describes an invariant in the human’s behavior that
manifests as a quasi-equilibrium in the dynamics. Therefore, the interaction
patterns can be captured using a PWARX model in the form described in [23],
with each pattern identified by a different set of parameters:
where $\Delta t$ is the sampling time [23]. Each interaction pattern is
identified as a model with a different set of parameters. The PWARX results are
shown in Figures 6 and 7, where three clusters are identified for both novice
and expert groups. To make the analysis more intuitive, the PWARX parameters are
transformed into speed, normal and tangential acceleration in the ellipsoid
plot.
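
The idea that each interaction pattern corresponds to one submodel can be
illustrated with a toy example. The sketch below is not the state-space
parameterization of [23]: it assumes a scalar, first-order piecewise ARX form
y[k] = a*y[k-1] + b*u[k-1] + c with already-known parameter sets, and labels
each sample with the submodel that gives the smallest one-step prediction
error. All names are hypothetical.

```python
def predict(model, y_prev, u_prev):
    """One-step prediction of an assumed first-order ARX submodel (a, b, c)."""
    a, b, c = model
    return a * y_prev + b * u_prev + c


def assign_modes(y, u, models):
    """Label each sample k >= 1 with the index of the best-fitting submodel."""
    labels = []
    for k in range(1, len(y)):
        errors = [abs(y[k] - predict(m, y[k - 1], u[k - 1])) for m in models]
        labels.append(errors.index(min(errors)))
    return labels


def simulate(models, schedule, u, y0=0.0):
    """Generate an output sequence from a known mode schedule (toy data only)."""
    y = [y0]
    for k, mode in enumerate(schedule):
        y.append(predict(models[mode], y[-1], u[k]))
    return y


# Toy usage: two modes, switched halfway through the record.
models = [(0.9, 0.1, 0.0), (0.5, 0.0, 1.0)]
schedule = [0] * 5 + [1] * 5
u = [1.0] * len(schedule)
y = simulate(models, schedule, u)
labels = assign_modes(y, u, models)
```

A full identification procedure would also estimate the parameter sets, for
example by alternating this assignment step with a least-squares refit of each
submodel; that step is omitted here.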



(a) Trajectory (b) Ratio 

Fig. 6. Dynamic clustering of novices 





(a) Trajectory (b) Ratio 

Fig. 7. Dynamic clustering of experts


4 Hierarchical Skill Assessment Results 

4.1 Subgoal closure 

As suggested in the introduction, dynamics-based clustering reveals the spatial
organization abilities of expert surgeons in performing the Peg Transfer task,
and makes it possible to delineate the different phases of the task and therein
analyze specific performance characteristics. In Figure 7(a), the spatial
organization of the behavior of expert surgeons is closely correlated with the
three phases of the Peg Transfer task:

1. Starting phase (cluster mode 2) coincides with the surgeons picking up the 
blocks. The movement during this phase follows a medium velocity range. 

2. Maneuvering phase (cluster mode 1) coincides with the surgeons moving the 
gripped blocks to the central area of the board. There is no restriction on 
the movement during the maneuvering phase, and the objective of the phase 
is to be as fast as possible. Therefore the surgeons adopt high velocity and 
the accelerations span a large range. 






















3. Interception phase (cluster mode 3) coincides with the blocks being
transferred in the air between the laparoscopic tools held in the two hands.
This phase is critical in that it requires a large amount of coordination
effort for both hands.

For each phase of the task, expert surgeons adopt a very consistent strategy.
In contrast, the maneuvers of novices are less consistent. During the starting
phase, novices sometimes reach high velocity, which penalizes accuracy. The
lack of consistency in the strategy also demands more attention to plan for new
trials and to handle the range of conditions. Novices require more attempts to
successfully pick up the blocks. In the maneuvering phase, the laparoscopic
tool frequently slows to a lower velocity, penalizing the completion speed of
the task.

4.2 High-level planning 

To facilitate the accomplishment of a complex task, humans divide the task into
subtasks. In [20], Kong and Mettler have shown that subtasks exploit invariants
in the dynamic environment interactions. The invariants in human behavior
emerge through extensive practice, ostensibly as a result of the assimilation
of coordinated movement and perceptual processes in procedural memory. These
interaction patterns can then be used as units of behavior for the larger
organization. High-level planning can therefore be assessed from the
organization of interaction patterns. Effective planning uses interaction
patterns that take advantage of the dynamic interaction between the human’s
motor skills and task elements and that also reduce the attention load. For
this reason, the spatial organization of the interaction patterns is an
important measure of the surgeons’ planning skill. To quantify the spatial
organization, the Cartesian coordinates of trajectory points are classified
using a Fisher classifier based on the tags obtained in PWA clustering. The
misclassification ratio is then used as the measure of spatial organization, as
shown in Table 3. A lower ratio indicates that the clusters occupy more
distinct spatial regions.


Table 3. Skill metric in high-level planning


Spatial organization [%]        Expert       Intermediate   Novice

Complete Groups                 17.9         29.6           38.6

Leave-One-Out Mean (std dev)    13.3 (4.4)   27.1 (5.6)     35.0 (6.5)
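
The spatial organization measure can be sketched as follows. This is an
illustrative reading, not the authors' implementation: it assumes planar (2D)
coordinates, equal class priors, and a linear discriminant built from class
means and a pooled within-class covariance, which is one common form of a
Fisher classifier. The function names are hypothetical.

```python
def fit_discriminant(points, tags):
    """Class means and pooled within-class covariance for planar points."""
    means = {}
    for c in sorted(set(tags)):
        members = [p for p, t in zip(points, tags) if t == c]
        means[c] = (sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members))
    sxx = sxy = syy = 0.0
    for (x, y), t in zip(points, tags):
        dx, dy = x - means[t][0], y - means[t][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    dof = len(points) - len(means)  # pooled degrees of freedom
    return means, (sxx / dof, sxy / dof, syy / dof)


def classify(point, means, cov):
    """Pick the class whose mean is nearest in Mahalanobis distance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate spread)

    def dist2(mu):
        dx, dy = point[0] - mu[0], point[1] - mu[1]
        return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

    return min(means, key=lambda c: dist2(means[c]))


def misclassification_ratio(points, tags):
    """Fraction of points whose predicted class differs from the cluster tag."""
    means, cov = fit_discriminant(points, tags)
    wrong = sum(classify(p, means, cov) != t for p, t in zip(points, tags))
    return wrong / len(points)
```

Multiplying the returned fraction by 100 gives a percentage comparable in form
to the table above. Clusters that occupy separate regions of the workspace
give a ratio near zero, while spatially interleaved clusters give a high ratio.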


5 Conclusion 


The results underscore the limitations of simple outcome measures such as those
obtained from kinematic characteristics (see Sections 3.1 and 3.2) and, on the
other hand, demonstrate the discriminative power of dynamical characteristics,
obtained here using a PWARX model. The latter provides a more detailed
segmentation and insights into the dynamic make-up of the behavior and its
spatial organization. This more detailed information correlates with important
procedural movement stages, yet it requires no prior, high-level knowledge
about the task to be implemented. Finally, the spatial characteristics of the
segmented performance data provide a measure of the ability to organize the
different stages of behavior in a manner that is consistent with the spatial
and dynamic constraints of the task and operator skills. These results
demonstrate that dynamical segmentation techniques can access attributes across
the entire process hierarchy and provide the foundation for a more
comprehensive skill analysis and modeling framework. Future work will extend
this approach first to more FLS tasks and ultimately to different tasks in
laparoscopic, robotic, and open surgery, and will incorporate gaze
characteristics.